On Efficient Measurement of the Impact of Hardware Errors in Computer Systems

نویسندگان

  • BEHROOZ SANGCHOOLIE
  • Behrooz Sangchoolie
چکیده

Technology and voltage scaling is making integrated circuits increasingly susceptible to failures caused by soft errors. The source of soft errors are temporary hardware faults that alter data and signals in digital circuits. Soft errors are predominately caused by ionizing particles, electrical noise and wear-out effects, but may also occur as a result of marginal circuit designs and manufacturing process variations. Modern computers are equipped with a range of hardware and software based mechanisms for detecting and correcting soft errors, as well as other types of hardware errors. While these mechanisms can handle a variety of errors and error types, protecting a computer completely from the effects of soft errors is technically and economically infeasible. Hence, in applications where reliability and data integrity is of primary concern, it is desirable to assess and measure the system’s ability to detect and correct soft errors. This thesis is devoted to the problem of measuring hardware error sensitivity of computer systems. We define hardware error sensitivity as the probability that a hardware error results in an undetected erroneous output. Since the complexity of computer systems makes it extremely demanding to assess the effectiveness of error handling mechanisms analytically, error sensitivity and related measures, e.g., error coverage, are in practice determined experimentally by means of fault injection experiments. The error sensitivity of a computer system depends not only on the design of its error handling mechanism, but also on the program executed by the computer. In addition, measurements of error sensitivity is affected by the experimental set-up, including how and where the errors are injected, and the assumptions about how soft errors are manifested, i.e., the error model. This thesis identifies and investigates six parameters, or sources of variation, that affect measurements of error sensitivity. These parameters consist of two subgroups, those that deal with systems characteristics, namely, (i) the input processed by a program, (ii) the program’s source code implementation, (iii) the level of compiler optimization; and those that deal with measurement setup, namely, (iv) the number of bits that are targeted in each experiment, (v) the target location in which faults are injected, (vi) the time of injection. To accurately measure the error sensitivity of a system, one needs to conduct several sets of fault injection experiments by varying different sources of variations. As these experiments are quite time-consuming, it is desirable to improve the efficiency of fault injection-based measurement of error sensitivity. To this end, the thesis proposes and evaluates different error space optimization and error space pruning techniques to reduce the time and effort needed to measure the error sensitivity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Proposing an Efficient Software-based Method to Enhance Reliability of Computer Systems against Soft Errors

In recent years, along with rapid developments in technology, computer systems haveincreasingly become more integrated and more modular. Indeed, the reliability and efficiency ofcomputer systems are of high significance. Hence, the quantitative evaluation of the optimizationof reliability indexes in computer systems is considered to be a crucial issue. Reliabilityenhancement of computer systems...

متن کامل

The impact of Cloud Computing in the banking industry resources

Today, one of the biggest problems that gripped the banking sphere, the high cost of implementing advanced technologies and the efficient use of the hardware. Cloud computing is the use of shared services on the Internet provides a large role in developing the banking system, without the need for operating expenses including staffing, equipment, hardware and software Reducing the cost of implem...

متن کامل

The impact of Cloud Computing in the banking industry resources

Today, one of the biggest problems that gripped the banking sphere, the high cost of implementing advanced technologies and the efficient use of the hardware. Cloud computing is the use of shared services on the Internet provides a large role in developing the banking system, without the need for operating expenses including staffing, equipment, hardware and software Reducing the cost of implem...

متن کامل

The Effect of Gauge Measurement Capability and Dependency Measure of Process Variables on the MCp

It has been proved that process capability indices provide very efficient measures of the capability of processes from many different perspectives. These indices have been widely used in the manufacturing industry for measuring process reproduction capability according to manufacturing specifications. In the past few years, univariate capability indices have been introduced and used to characte...

متن کامل

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...

متن کامل

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017